klotz: prompt engineering + large language models


  1. Unusually detailed post on how OpenAI's Codex CLI coding agent works, covering the agent loop, prompt construction, caching, and context window management.

    OpenAI engineer Michael Bolin explains the "agent loop": the cycle in which the agent receives user input, generates code, runs tests, and iterates under human supervision.

    * **Agent Loop Mechanics:** The agent builds prompts with prioritized components (system, developer, user, assistant) and sends them to OpenAI’s Responses API.
    * **Prompt Management:** Because the full conversation is resent each turn, cumulative tokens processed grow quadratically with conversation length; the system copes through caching, compaction, and a stateless API design (allowing for "Zero Data Retention"). Cache misses can significantly impact performance.
    * **Context Window:** Codex automatically compacts conversations to stay within the AI model's context window.
    * **Open Source Focus:** Unlike ChatGPT, the Codex CLI client is open source, suggesting a different approach to development and transparency for coding tools.
    * **Challenges Acknowledged:** The article doesn't shy away from the engineering challenges, like performance issues and bugs encountered during development.
    * **Future Coverage:** Bolin plans to release further posts detailing the CLI’s architecture, tool implementation, and sandboxing model.
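
    A minimal sketch of such a loop, assuming the `openai` Python SDK's Responses API; the model name, stop condition, and compaction policy are placeholders, and command execution is stubbed out:

    ```python
    # Illustrative agent loop, not Codex's actual implementation.
    from openai import OpenAI

    client = OpenAI()

    def compact(history: list[dict], limit: int = 20) -> list[dict]:
        """Stub compaction: keep the developer message plus recent turns so
        the rebuilt prompt stays inside the model's context window."""
        if len(history) <= limit:
            return history
        return history[:1] + history[-(limit - 1):]

    def agent_loop(task: str, max_steps: int = 10) -> str:
        # The prompt is rebuilt every turn from prioritized components:
        # developer instructions first, then the user/assistant history.
        history = [
            {"role": "developer",
             "content": "You are a coding agent. Propose one shell command per turn, or say DONE."},
            {"role": "user", "content": task},
        ]
        reply = ""
        for _ in range(max_steps):
            history = compact(history)
            response = client.responses.create(model="gpt-5", input=history)
            reply = response.output_text
            history.append({"role": "assistant", "content": reply})
            if "DONE" in reply:  # placeholder termination check
                break
            # A real agent would run the proposed command in a sandbox here
            # and feed its output back as the next turn.
            history.append({"role": "user", "content": "Command output: ..."})
        return reply
    ```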
  2. For popular LLMs (Gemini, GPT, Claude, and DeepSeek), repeating the input prompt improves performance without increasing the number of generated tokens or latency, provided reasoning mode is not used.
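
    A sketch of the technique with invented example text; the paper's exact repetition format may differ:

    ```python
    def repeated_prompt(question: str, n: int = 2) -> str:
        """Duplicate the question n times in one prompt. The model still
        emits a single answer, so output tokens and latency are unchanged."""
        return "\n\n".join([question] * n)

    # The prompt contains the question twice; only input tokens increase.
    prompt = repeated_prompt("List three causes of the 2008 financial crisis.")
    ```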
  3. Guides, papers, lessons, notebooks and resources for prompt engineering, context engineering, RAG, and AI Agents.
  4. A comprehensive overview of the current state of the Model Context Protocol (MCP), including advancements, challenges, and future directions.
  5. LLM EvalKit is a streamlined framework that helps developers design, test, and refine prompt‑engineering pipelines for Large Language Models (LLMs). It encompasses prompt management, dataset handling, evaluation, and automated optimization, all wrapped in a Streamlit web UI.

    Key capabilities:

    | Stage | What it does | Typical workflow |
    |-------|-------------|------------------|
    | **Prompt Management** | Create, edit, version, and test prompts (name, text, model, system instructions). | Define a prompt, load/edit existing ones, run quick generation tests, and maintain version history. |
    | **Dataset Creation** | Organize data for evaluation. Loads CSV, JSON, JSONL files into GCS buckets. | Create dataset folders, upload files, preview items. |
    | **Evaluation** | Run model‑based or human‑in‑the‑loop metrics; compare outcomes across prompt versions. | Choose prompt + dataset, generate responses, score with metrics like “question‑answering‑quality”, save baseline results to a leaderboard. |
    | **Optimization** | Leverages Vertex AI’s prompt‑optimization job to automatically search for better prompts. | Configure the job (model, dataset, prompt), launch it, and monitor training in the Vertex AI console. |
    | **Results & Records** | Visualize optimization outcomes, compare versions, and maintain a record of performance over time. | View leaderboard, select best optimized prompt, paste new instructions, re‑evaluate, and track progress. |

    **Getting Started**

    1. Clone the repo, set up a virtual environment, install dependencies, and run `streamlit run index.py`.
    2. Configure `src/.env` with `BUCKET_NAME` and `PROJECT_ID`.
    3. Use the UI to create/edit prompts, datasets, and launch evaluations/optimizations as described in the tutorial steps.
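
    For example, `src/.env` might contain (placeholder values):

    ```
    BUCKET_NAME=your-gcs-bucket
    PROJECT_ID=your-gcp-project-id
    ```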

    **Token Use‑Case**

    - **Prompt**: `Problem: {{query}}\nImage: {{image}} @@@image/jpeg\nAnswer: {{target}}`
    - **Example input JSON**: query, choices, image URL, target answer.
    - **Model**: `gemini-2.0-flash-001`.
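
    A sketch of how the `{{...}}` placeholders might be filled from one dataset row; the field values and the `render` helper are invented for illustration:

    ```python
    import re

    # Template from the use case above; {{...}} names must match dataset fields.
    TEMPLATE = "Problem: {{query}}\nImage: {{image}} @@@image/jpeg\nAnswer: {{target}}"

    row = {  # hypothetical dataset item with the fields listed above
        "query": "Which answer choice best matches the chart?",
        "image": "gs://your-gcs-bucket/images/item-42.jpg",
        "target": "B",
    }

    def render(template: str, fields: dict) -> str:
        """Substitute each {{name}} placeholder with the matching field value."""
        return re.sub(r"\{\{(\w+)\}\}", lambda m: str(fields[m.group(1)]), template)

    print(render(TEMPLATE, row))
    ```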

    **License** – Apache 2.0.
  6. This article explores how prompt engineering can be used to improve time-series analysis with Large Language Models (LLMs), covering core strategies, preprocessing, anomaly detection, and feature engineering. It provides practical prompts and examples for various tasks.
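
    A hypothetical prompt in the spirit of the article's anomaly-detection examples; the readings and wording are invented:

    ```python
    # Inline the series as text and ask for indexed anomalies.
    readings = [101, 99, 100, 102, 98, 100, 167, 101, 100]

    prompt = (
        "You are a time-series analyst. Below are hourly sensor readings.\n"
        f"Readings: {readings}\n"
        "List the indices of any anomalous points and briefly explain why "
        "each one deviates from the local trend."
    )
    ```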
  7. This article explores strategies for effectively curating and managing the context that powers AI agents, discussing the shift from prompt engineering to context engineering and techniques for optimizing context usage in LLMs.
  8. This article introduces the LLM Function Design Pattern, a structured approach to building AI-powered software. It addresses the challenges of integrating Large Language Models (LLMs) into applications by outlining a pattern that promotes modularity, testability, and maintainability. The pattern involves defining clear functions with specific inputs and outputs, and then leveraging LLMs to implement the core logic within those functions.
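
    A minimal sketch of what such a function might look like, assuming the `openai` Python SDK; the task, model name, and output parsing are invented for illustration:

    ```python
    from dataclasses import dataclass
    from openai import OpenAI

    client = OpenAI()

    @dataclass
    class SentimentResult:
        label: str      # "positive" | "negative" | "neutral"
        rationale: str

    def classify_sentiment(text: str) -> SentimentResult:
        """An 'LLM function': a fixed signature and return type, with the
        LLM implementing only the core logic, so callers and tests never
        touch the prompt directly."""
        response = client.responses.create(
            model="gpt-5",  # assumed model name
            input=(
                "Classify the sentiment of the text as positive, negative, "
                "or neutral on the first line, then justify in one "
                f"sentence.\n\nText: {text}"
            ),
        )
        label, _, rationale = response.output_text.partition("\n")
        return SentimentResult(label=label.strip().lower(),
                               rationale=rationale.strip())
    ```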
  9. This guide offers five essential tips for writing effective GitHub Copilot custom instructions, covering project overview, tech stack, coding guidelines, structure, and resources, to help developers get better code suggestions.
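
    A hypothetical `.github/copilot-instructions.md` organized around those five tips; every project detail below is invented:

    ```markdown
    # Project overview
    A REST API for inventory tracking, deployed to AWS Lambda.

    # Tech stack
    Python 3.12, FastAPI, SQLAlchemy, pytest.

    # Coding guidelines
    Use type hints everywhere; raise domain-specific exceptions; no bare `except`.

    # Project structure
    `src/api/` holds routes, `src/models/` holds ORM models, `tests/` mirrors `src/`.

    # Resources
    Follow the internal style guide in `docs/STYLE.md`.
    ```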
  10. This article discusses the concept of 'tool masking' as a way to optimize the interaction between LLMs and APIs, arguing that simply exposing all API functionality (as done by MCP) is inefficient and degrades performance. It proposes shaping the tool surface to match the specific use case, improving accuracy, cost, and latency.
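
    An illustrative contrast with invented, simplified tool schemas: a tool that mirrors the raw API surface versus one masked to a single use case:

    ```python
    # Unmasked: mirrors the whole API, so the model must choose among many
    # endpoints and build arbitrary payloads.
    raw_tool = {
        "name": "issues_api",
        "description": "Full issue-tracker HTTP API",
        "parameters": {
            "endpoint": {"type": "string"},  # any of dozens of endpoints
            "method": {"type": "string"},
            "body": {"type": "object"},      # free-form payload
        },
    }

    # Masked: one narrow operation exposing only the fields this workflow
    # needs, which improves accuracy and cuts prompt tokens.
    file_bug_tool = {
        "name": "file_bug",
        "description": "File a bug against the current repository",
        "parameters": {
            "title": {"type": "string"},
            "severity": {"type": "string", "enum": ["low", "medium", "high"]},
            "steps_to_reproduce": {"type": "string"},
        },
    }
    ```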
